Estimation of viral richness from shotgun metagenomes using a frequency count approach
Abstract
BACKGROUND: Viruses are important drivers of ecosystem functions, yet little is known about the vast majority of viruses. Viral shotgun metagenomics enables the investigation of broad ecological questions in phage communities. One ecological characteristic is species richness, which is the number of different species in a community. Viruses do not have a phylogenetic marker analogous to the bacterial 16S rRNA gene with which to estimate richness, and so contig spectra are employed to measure the number of virus taxa in a given community. A contig spectrum is generated from a viral shotgun metagenome by assembling the random sequence reads into groups of sequences that overlap (contigs) and counting the number of sequences that group within each contig. Current tools available to analyze contig spectra to estimate phage richness are limited by relying on rank-abundance data.

RESULTS: We present statistical estimates of virus richness from contig spectra. The program CatchAll (http://www.northeastern.edu/catchall/) was used to analyze contig spectra in terms of frequency count data rather than rank-abundance, thus enabling formal statistical analyses. Also, the influence of potentially spurious low-frequency counts on richness estimates was minimized by two methods, empirical and statistical. The results show greater estimates of viral richness than previous calculations in nearly all environments analyzed, including swine feces and reclaimed fresh water.

CONCLUSIONS: CatchAll yielded consistent estimates of richness across viral metagenomes from the same or similar environments. Additionally, analysis of pooled viral metagenomes from different environments via mixed contig spectra resulted in greater richness estimates than those of the component metagenomes. Using CatchAll to analyze contig spectra will improve estimations of richness from viral shotgun metagenomes, particularly from large datasets, by providing statistical measures of richness.
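To make the idea of frequency count data concrete, here is a minimal Python sketch. It tabulates a contig spectrum from a list of per-contig read counts and then applies the Chao1 lower-bound estimator, a standard nonparametric richness estimator chosen here only for illustration. It is not presented as the procedure used in the paper, and it does not reproduce CatchAll's model fitting. The function names, the input list, and the simplification of treating each contig as one taxon are all assumptions.

```python
from collections import Counter


def contig_spectrum(reads_per_contig):
    """Tabulate frequency counts: {k: number of contigs holding exactly k reads}.

    `reads_per_contig` is a hypothetical input: one integer per contig
    (a singleton read that did not assemble counts as a contig of size 1).
    """
    return dict(Counter(reads_per_contig))


def chao1(spectrum):
    """Chao1 lower-bound richness estimate from frequency count data.

    Assumption for illustration: each contig stands in for one taxon.
    Uses S_obs + f1^2 / (2 * f2), falling back to the bias-corrected
    form S_obs + f1 * (f1 - 1) / 2 when there are no doubletons.
    """
    observed = sum(spectrum.values())  # number of distinct contigs seen
    f1 = spectrum.get(1, 0)            # contigs with exactly one read
    f2 = spectrum.get(2, 0)            # contigs with exactly two reads
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    return observed + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Made-up read counts for eight contigs, purely illustrative.
    sizes = [1, 1, 1, 1, 2, 2, 3, 5]
    spectrum = contig_spectrum(sizes)
    print(spectrum)
    print(chao1(spectrum))
```

The estimate depends heavily on the singleton and doubleton counts, which is why the treatment of potentially spurious low-frequency counts described above matters for the final richness figure.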
Related articles
Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities.
Sequencing of 16S rDNA polymerase chain reaction (PCR) amplicons is the most common approach for investigating environmental prokaryotic diversity, despite the known biases introduced during PCR. Here we show that 16S rDNA fragments derived from Illumina-sequenced environmental metagenomes (mi tags) are a powerful alternative to 16S rDNA amplicons for investigating the taxonomic diversity and s...
Soil Metagenomes from Different Pristine Environments of Northwest Argentina
This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential.
Ab initio gene identification in metagenomic sequences
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from ...
Host-Associated and Free-Living Phage Communities Differ Profoundly in Phylogenetic Composition
Phylogenetic profiling has been widely used for comparing bacterial communities, but has so far been impossible to apply to viruses because of the lack of a single marker gene analogous to 16S rRNA. Here we developed a reference tree approach for matching viral sequences and applied it to the largest viral datasets available. The resulting technique, Shotgun UniFrac, was used to compare host-as...
Drift Change Point Estimation in the rate and dependence Parameters of Autocorrelated Poisson Count Processes Using MLE Approach: An Application to IP Counts Data
Change point estimation in statistical process control has received considerable attention in recent decades because it helps process engineers identify and remove assignable causes as quickly as possible. On the other hand, improvements in measurement systems and data storage lead to observations being taken very close to each other in time and, as a result, increasing autocorrelatio...